With the modern trend toward total surveillance and the collection of every kind of information, secure means of communication are more relevant than ever. Encrypting the transmitted data solves the problem only partially, because the very fact that participants exchanged information can matter more than what they said.
In most modern systems, be it email, ICQ or Twitter, the owner of the servers holds all of this data and can hand it over in response to a formal or informal request. Below is a design for a network built on top of I2P in which the owner uses his nodes only to make the network more stable and to act as gateways to the regular Internet, and has no more information than ordinary I2P nodes do.
Let's consider the I2P mechanisms for anonymity and confidentiality that our network will rely on:
- Each I2P participant consists of a router, which is known to the rest of the network, and one or more addresses, which form the actual “invisible” network. The point of I2P is that it is practically impossible to find out which router a particular address lives on.
- An I2P address is a pair of public keys, one for asymmetric encryption and one for signing. The matching pair of private keys is kept by the owner and proves the authenticity of the address. In other words, this key file is used for authorization instead of passwords; it is analogous to an electronic digital signature and can, if necessary, be implemented as a hardware token.
- Connections between routers are encrypted with AES. The session key is negotiated in several steps, which include verifying the signature of the host's address to counter man-in-the-middle attacks.
It was shown earlier that I2P is really a two-layer design: a router, which handles communication with other routers and tunnels, and a set of protocols for transferring data between applications. While the router protocols look carefully thought out and effective, the application protocols leave much to be desired: they are a jumble of concepts and ideas, driven by the desire to be as universal as possible and “transparent” to existing applications. In our case the task is much simpler: data is exchanged only between our own clients, so we can use our own protocol.
Another problem with I2P is that an attempt to reach an address can fail with an “address not found” error even though the resource at that address is online. The cause is an incomplete network database, for example right after startup, when the stored information about many routers is outdated and takes time to refresh. Because addresses publish their LeaseSets on the floodfills “closest” to them, the client may simply not have the right floodfills in its database yet. Our clients will use a second network database that contains only the nodes corresponding to our servers, and will publish their LeaseSets only on those nodes, so they can find each other's LeaseSets immediately.
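The idea of a small, fully known second database can be sketched as follows. This is a simplified illustration, not I2P's actual lookup code: it ranks nodes by plain XOR distance between 32-byte hashes, whereas real I2P derives a rotating routing key first. The server list, the `closest_nodes` helper and the replication count of 2 are all assumptions made for the example.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two equal-length hashes."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, nodes: list, count: int = 2) -> list:
    """Return the `count` nodes whose hashes are closest to `target`."""
    return sorted(nodes, key=lambda n: xor_distance(target, n))[:count]

# Hypothetical hashes of our servers; every client ships with this full list.
SERVER_HASHES = [hashlib.sha256(f"server-{i}".encode()).digest() for i in range(5)]

# Placeholder for the 32-byte hash of a client's address.
client_hash = hashlib.sha256(b"example-client-identity").digest()

# The publisher and the searcher rank the same complete list,
# so they always arrive at the same servers.
publish_to = closest_nodes(client_hash, SERVER_HASHES)
lookup_at = closest_nodes(client_hash, SERVER_HASHES)
```

Because both sides hold the complete server list from the start, the lookup cannot miss for lack of floodfill knowledge, which is the failure described above.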
Each I2P node is identified by an I2P address: two public/private key pairs generated randomly when the node is created, with no correlation to its IP address or location. There is no central authority that issues addresses; the probability that two randomly generated addresses coincide is assumed to be negligible. The owner of the node is whoever holds the file with the complete set of keys. The two public keys and a 3-byte certificate (currently always null) form a 387-byte node identifier, by which the node is known in I2P. Because the full 387-byte identifier is inefficient to compare, sort and transmit, its 32-byte SHA-256 hash is used to identify the node, and we will use the same hash to identify a client. Since the address contains the signing key, impersonating another client is difficult: it would require finding a key pair whose identifier hashes to the given value. If necessary, a client can prove that he is the one behind an I2P address by signing a document with his key.
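A minimal sketch of how the 32-byte identifier is derived from the 387-byte one. The 256-byte and 128-byte split between the encryption and signing keys is my understanding of the classic key sizes and should be treated as an assumption; the random bytes stand in for real public keys, and `build_identity` and `ident_hash` are names invented for the example.

```python
import hashlib
import os

ENC_KEY_LEN = 256           # assumed size of the public encryption key
SIGN_KEY_LEN = 128          # assumed size of the public signing key
NULL_CERT = b"\x00\x00\x00"  # 3-byte null certificate

def build_identity(enc_pub: bytes, sign_pub: bytes, cert: bytes = NULL_CERT) -> bytes:
    """Concatenate both public keys and the certificate into the full identifier."""
    if len(enc_pub) != ENC_KEY_LEN or len(sign_pub) != SIGN_KEY_LEN:
        raise ValueError("unexpected key length")
    return enc_pub + sign_pub + cert

def ident_hash(identity: bytes) -> bytes:
    """32-byte SHA-256 hash used as the short identifier."""
    return hashlib.sha256(identity).digest()

# Placeholder keys: random bytes, not real public keys.
identity = build_identity(os.urandom(ENC_KEY_LEN), os.urandom(SIGN_KEY_LEN))
short_id = ident_hash(identity)
```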
So our network will consist of clients running on users' computers and of servers that belong to us. Both clients and servers are full-fledged I2P routers. The servers are declared as high-speed and are meant primarily to carry transit traffic, while clients mostly use their own tunnels and carry transit traffic only to mask their own activity. Information about the servers is public and known to clients, but the servers know nothing about the clients and cannot distinguish them from regular I2P nodes. Clients will choose tunnel nodes so that each tunnel contains exactly one of our servers, with the remaining nodes belonging to other participants of regular I2P. Even if all of our servers fall under an attacker's control, one node is not enough to determine the other end of a 3-hop tunnel, the I2P standard. The user will always be able to see the tunnel routes and to exclude suspicious nodes.
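The hop selection rule can be sketched like this. It is an illustration under simple assumptions: nodes are represented by plain strings, the server's position in the tunnel is chosen at random, and the function name `pick_tunnel` and the sample node names are hypothetical.

```python
import random

def pick_tunnel(servers, others, excluded=frozenset(), length=3, rng=random):
    """Choose `length` hops containing exactly one of our servers."""
    servers_ok = [s for s in servers if s not in excluded]
    # Regular I2P nodes: not excluded by the user and not one of our servers.
    others_ok = [o for o in others if o not in excluded and o not in servers]
    if not servers_ok or len(others_ok) < length - 1:
        raise ValueError("not enough candidate nodes")
    hops = rng.sample(others_ok, length - 1)
    # Put the single server at a random position among the hops.
    hops.insert(rng.randrange(length), rng.choice(servers_ok))
    return hops

servers = ["srv-a", "srv-b"]
others = ["n1", "n2", "n3", "n4"]
tunnel = pick_tunnel(servers, others, excluded={"n3"})
```

The `excluded` set corresponds to the user's ability to rule out nodes he finds suspicious.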
On the other hand, having one of our servers in each tunnel is necessary to make tunnels more reliable through early detection of tunnels that have stopped working. This is one of the fundamental problems of I2P: if a node agreed to take part in a transit tunnel and then went offline (for example, because the user shut it down), the tunnel creator knows nothing about it and keeps using the broken tunnel for a long time. Unlike regular I2P, our clients will actively send test messages into the tunnel. As soon as our server detects that traffic in a tunnel has stopped, it will publish a notification for clients, allowing the client to stop using that tunnel immediately.
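On the server side, detecting a dead tunnel could look roughly like the sketch below. The class name `TunnelWatch`, the 30-second idle timeout and the use of integer tunnel IDs are assumptions for the example; how the resulting list is published to clients is left out.

```python
class TunnelWatch:
    """Tracks when each transit tunnel last carried traffic."""

    def __init__(self, idle_timeout: float = 30.0):
        self.idle_timeout = idle_timeout  # seconds without traffic before a tunnel counts as dead
        self.last_seen = {}               # tunnel_id -> timestamp of last message

    def on_traffic(self, tunnel_id: int, now: float) -> None:
        """Record a message (including a client's test message) seen in a tunnel."""
        self.last_seen[tunnel_id] = now

    def dead_tunnels(self, now: float) -> list:
        """Return tunnels idle for longer than the timeout and stop tracking them."""
        dead = [t for t, seen in self.last_seen.items()
                if now - seen > self.idle_timeout]
        for t in dead:
            del self.last_seen[t]
        return dead

watch = TunnelWatch(idle_timeout=30.0)
watch.on_traffic(1, now=0.0)
watch.on_traffic(2, now=20.0)
to_announce = watch.dead_tunnels(now=35.0)  # tunnel 1 has been silent for 35 s
```

Since clients send test messages at regular intervals, silence longer than the timeout is a reasonable signal that some hop before the server has failed.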
To exchange data between our clients we can use I2NP message type 20 (Data), which carries arbitrary data, or message type 11 (Garlic). Originally, I2P assumed the following exchange scheme between addresses: the sender requested the recipient's LeaseSet, built a Garlic message naming the address as the destination, encrypted it with the public encryption key from the LeaseSet and sent it into the appropriate tunnel. The router that received the message decrypted it and then determined whom it was intended for. In that scheme, however, the encryption key had to be the same for all addresses hosted on a given router, which was a serious security hole. In the modern implementation of I2P, therefore, each address has its own set of inbound tunnels and its own encryption key, so the router can determine the address even without the “garlic” message. By not using garlic encryption we get rid of yet another cumbersome I2P construct, the AES/ElGamal engine, and can use encryption that is more efficient for our purposes, while still sending type 11 messages so that our traffic is indistinguishable from regular I2P traffic.
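For illustration, here is how a type 20 Data message might be framed. The layout (a 16-byte header with type, message ID, expiration, size and a one-byte checksum, followed by a 4-byte length and the payload) reflects my reading of the I2NP format and should be checked against the specification before being relied on; the function name is hypothetical.

```python
import hashlib
import struct

I2NP_DATA = 20  # message type from the text above

def i2np_data_message(payload: bytes, msg_id: int, expiration_ms: int) -> bytes:
    """Frame `payload` as a Data message (assumed layout, see lead-in)."""
    # Body of a Data message: 4-byte big-endian length, then the data itself.
    body = struct.pack(">I", len(payload)) + payload
    # Assumed checksum: first byte of the SHA-256 hash of the body.
    checksum = hashlib.sha256(body).digest()[0]
    # Header: type (1), message ID (4), expiration (8), body size (2), checksum (1).
    header = struct.pack(">BIQHB", I2NP_DATA, msg_id, expiration_ms,
                         len(body), checksum)
    return header + body

msg = i2np_data_message(b"hello", msg_id=1, expiration_ms=1_700_000_000_000)
```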
Clients can exchange mail both among themselves within the network and with external recipients. In the first case, I2P addresses are used directly, and messages are sent through the tunnels listed in the recipient's LeaseSet. If the client cannot find a LeaseSet for the address, it keeps trying for a certain time and then generates a non-delivery notice.
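The retry-then-give-up behaviour might be sketched as follows. The `lookup` and `send` callables, the 60-second timeout and the 5-second retry delay are assumptions for the example; the clock and sleep functions are injectable only so the logic can be exercised without waiting.

```python
import time

def deliver(lookup, send, dest_hash, message,
            timeout=60.0, retry_delay=5.0,
            clock=time.monotonic, sleep=time.sleep) -> bool:
    """Try to find the recipient's LeaseSet until `timeout` expires.

    Returns True once the message is sent, or False if the caller
    should generate a non-delivery notice.
    """
    deadline = clock() + timeout
    while True:
        lease_set = lookup(dest_hash)
        if lease_set is not None:
            send(lease_set, message)
            return True
        if clock() + retry_delay > deadline:
            return False  # no time left for another attempt
        sleep(retry_delay)
```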
In the second case, the client uses one of our servers as its outgoing SMTP server. Each of our servers will have its own mail domain, and the client's I2P address will correspond to a username assigned by the server; together they form a valid mailing address. To send a message outside the network, the client finds the server's LeaseSet (which it is guaranteed to find) and sends the message to the server, which recognizes it as mail and forwards it to the recipient as a regular SMTP server would. The recipient learns only the address on our SMTP server, and even if someone asks us who is behind a particular address, the most we can tell them is the I2P address; we ourselves do not know whose address it is. When a server receives a message from outside, it uses the username to look up the corresponding I2P address and then delivers the message within our network in the usual way.
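The mapping the gateway has to keep can be sketched as below. Deriving the username from a Base32 prefix of the address hash is purely an assumption for the example (the text only says the server assigns a name), and `GatewayRegistry` and `example.org` are hypothetical.

```python
import base64
from typing import Optional

class GatewayRegistry:
    """Maps server-assigned usernames to 32-byte I2P address hashes."""

    def __init__(self, domain: str):
        self.domain = domain
        self.by_name = {}  # username -> I2P address hash

    def register(self, i2p_hash: bytes) -> str:
        """Assign a username to an address and return the full mail address."""
        # Hypothetical naming scheme: Base32 of the first 10 bytes (16 characters).
        name = base64.b32encode(i2p_hash[:10]).decode("ascii").lower()
        self.by_name[name] = i2p_hash
        return f"{name}@{self.domain}"

    def resolve(self, mail_address: str) -> Optional[bytes]:
        """Find the I2P address hash for incoming mail, or None if unknown."""
        name, _, domain = mail_address.partition("@")
        if domain != self.domain:
            return None
        return self.by_name.get(name)

registry = GatewayRegistry("example.org")
client_hash = bytes(range(32))  # placeholder address hash
address = registry.register(client_hash)
```

Note that the table holds only I2P address hashes, which is all the server could disclose if asked.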
To combat spam, we will limit the number of messages that can be sent from each I2P address. Before an address can send messages outside the network, it will have to register on the server and obtain its username, and we will require it to present a certificate produced by some resource-intensive computation. This makes mass generation of addresses harder without creating problems for those who need only one or a few addresses.
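One possible form of such a certificate is a hashcash-style proof of work, sketched below. The text does not say which computation would be used, so this choice, the SHA-256 puzzle, the 8-byte nonce and the difficulty of 12 bits are all assumptions for illustration.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def verify(i2p_hash: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check done by the server: one hash computation."""
    digest = hashlib.sha256(i2p_hash + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

def solve(i2p_hash: bytes, difficulty: int) -> int:
    """Expensive search done by the client: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(i2p_hash, nonce, difficulty):
            return nonce

client_hash = hashlib.sha256(b"example-client-identity").digest()
nonce = solve(client_hash, difficulty=12)  # low difficulty so the demo finishes quickly
```

Because the nonce is bound to one address hash, each newly generated address has to repeat the work, while a user with a single address pays the cost once.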
Thus we get a network that, on the one hand, ensures the anonymity and confidentiality of transmitted information, which cannot be disclosed without access to the client's computer, and, on the other hand, maintains a high level of trust between clients through cryptographic identification. Using only our own protocol between clients significantly simplifies the implementation and makes the network more reliable, while the new high-speed routers will improve the operation and throughput of I2P itself.
I would like to hear the opinion of the respected Habr community on the proposed project as a whole, and above all on potential attacks aimed at de-anonymizing clients, as well as on other weaknesses and vulnerabilities.